ON NEAR(EST) CORRELATION MATRIX 



PASHA ZUSMANOVICH 



Abstract. We present an elementary heuristic reasoning based on Arnold's theory of versal 
deformations in support of a straightforward algorithm for finding a correlation matrix near 
the given symmetric one. 



Introduction 

Bankers are interested in correlations between time series associated with various financial
instruments (such as prices of stocks, options, futures and other derivatives, currency
exchange rates, etc.), presented in the form of a sample correlation matrix. As a bona fide
correlation matrix, it should be positive semidefinite. In practice, however, the computed
matrix almost always turns out not to be positive semidefinite. There are two main reasons for
this: methodological errors (taking data for different instruments over different time ranges,
inconsistent approaches to filling in missing data), and floating-point rounding errors.

The computed correlation matrix is used, however, in further analysis, such as the evaluation
of various risks; for this, its positive semidefiniteness is crucial. As in most cases it is
impossible to trace the origin of the problem, due to shortage of time, the large amount of
numerical data (a typical scenario may involve daily computed correlation matrices reaching
the size of ten thousand by ten thousand), and the complexity of the methods used in its
retrieval, processing and storage, one usually resorts to "correcting" the symmetric matrix
at hand to make it positive semidefinite.

Naturally, this "correction" should be as small as possible. So, a practical problem arises: 
for a given symmetric matrix, find the nearest, in some sense, correlation matrix. A quick 
glance at the literature (mentioned below) suggests that this problem arises not only in
banking.

Not surprisingly, this problem has attracted considerable attention. While an exact
expression for the nearest correlation matrix is not available, many papers (see [BH], [QXX]
and references therein) contain algorithms for its determination. These algorithms utilize
methods from convex analysis, semismooth optimization, and other sophisticated branches
of numerical mathematics. Earlier results in this direction are also surveyed in [Ge, §9.4.6].
In all these works, "nearest" is understood in the sense of the Frobenius matrix norm, or
some (weighted) variation of it.

In real life, however, bankers tend to ignore all this wisdom and implement a very
pedestrian approach to this problem (sometimes called "shrinking", and found, with some
variations, in [DI], [QXX], [RM], [Ge, Exercise 9.14], and in many other places).
Namely, in the spectral decomposition $A = B J B^T$ of a given $n \times n$ symmetric matrix $A$,
where $B$ is an orthogonal matrix of eigenvectors, and

$$J = \operatorname{diag}(\lambda_1, \dots, \lambda_n) \tag{1}$$

is a diagonal matrix of eigenvalues of $A$, replace all negative eigenvalues by some small positive
number $\varepsilon$:

$$\tilde\lambda_i = \begin{cases} \varepsilon & \text{if } \lambda_i < 0, \\ \lambda_i & \text{if } \lambda_i > 0, \end{cases}$$

for $i = 1, \dots, n$ (in practice, zero eigenvalues do not occur). The resulting matrix

$$\tilde A = B \operatorname{diag}(\tilde\lambda_1, \dots, \tilde\lambda_n) B^T = (\tilde a_{ij})_{i,j=1}^n$$

is a positive definite covariance matrix, and its normalization

$$\left( \frac{\tilde a_{ij}}{\sqrt{\tilde a_{ii} \tilde a_{jj}}} \right)_{i,j=1}^n$$

is declared to be the requested correlation matrix, allegedly close to the initial matrix $A$.
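
The procedure just described can be sketched in a few lines of Python. This is only a minimal illustration under stated assumptions, not the implementation used in any bank: the function name `clip_and_normalize`, the default `eps = 1e-6` for the small positive number, the use of NumPy's `numpy.linalg.eigh` for the spectral decomposition, and the 3 x 3 test matrix are all choices made for this example.

```python
import numpy as np

def clip_and_normalize(a, eps=1e-6):
    """Replace negative eigenvalues of a symmetric matrix by eps,
    then rescale the result to have units on the main diagonal."""
    a = np.asarray(a, dtype=float)
    lam, b = np.linalg.eigh(a)               # a = b @ diag(lam) @ b.T, b orthogonal
    lam_tilde = np.where(lam < 0, eps, lam)  # clip negative eigenvalues to eps
    a_tilde = (b * lam_tilde) @ b.T          # b @ diag(lam_tilde) @ b.T
    d = np.sqrt(np.diag(a_tilde))            # square roots of the diagonal entries
    return a_tilde / np.outer(d, d)          # a_tilde[i, j] / sqrt(a_tilde[i, i] * a_tilde[j, j])

# Hypothetical symmetric matrix with unit diagonal and one negative eigenvalue.
A = np.array([[1.0, 0.9, 0.2],
              [0.9, 1.0, 0.9],
              [0.2, 0.9, 1.0]])
C = clip_and_normalize(A)

print(np.linalg.eigvalsh(A).min())  # negative before the correction
print(np.linalg.eigvalsh(C).min())  # positive after the correction
print(np.max(np.abs(C - A)))        # max-norm distance between A and C
```

The last printed value is the max-norm distance between the initial and corrected matrices, which is the quantity the closeness claim below refers to.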

This pedestrian approach turns out to be very efficient in practice (in all banking numerical
examples we have observed, the initial and corrected matrices were very close with respect to
the max norm*, and no discrepancies occurred when the corrected matrix was used in the subsequent
analysis). In this note we offer a heuristic argument explaining this efficiency, which may seem
unreasonable at first glance. The argument, presented in §2, is an easy application of Arnold's
theory of versal deformations of matrices. A fragment of the theory needed for our purposes
is briefly recalled in §1.



1. Arnold's theory of versal deformations 



In 1971, Vladimir Arnold developed a theory of versal deformations of matrices, which
triggered a wave of subsequent work. The original paper [A] is still the best exposition of
this theory. The main result of this theory can be formulated in many different ways; one of
them runs as follows.

* In §2 we use at a certain place submultiplicativity (i.e., $\|AB\| \le \|A\| \|B\|$ for any two
matrices $A$, $B$) of the matrix norm $\|\cdot\|$ measuring the "nearness". The max norm is not
submultiplicative, so, formally, it does not fit those arguments. This can be remedied, however,
by a minor (and well-known) fix: the max norm becomes submultiplicative when multiplied by the
matrix size (see, for example, [HJ, p. 292]). Even taking this factor into account ($< 10^5$ in
practice), the absolute values of the differences between the corresponding elements of the
initial and corrected matrices remained very small in all examples we have seen.

Let $A$ be a complex $n \times n$ matrix with distinct eigenvalues $\lambda_1, \lambda_2, \dots, \lambda_k$, and with a Jordan
normal form

$$J = \operatorname{diag}\bigl( J_{n_{11},\dots,n_{1m_1}}(\lambda_1), \dots, J_{n_{k1},\dots,n_{km_k}}(\lambda_k) \bigr),$$

where

$$J_\ell(\lambda) = \begin{pmatrix} \lambda & 1 & & \\ & \lambda & \ddots & \\ & & \ddots & 1 \\ & & & \lambda \end{pmatrix}$$

is one Jordan block of size $\ell \times \ell$, and

$$J_{n_1,\dots,n_m}(\lambda) = \operatorname{diag}\bigl( J_{n_1}(\lambda), \dots, J_{n_m}(\lambda) \bigr)$$

consists of all Jordan blocks of sizes $n_1 \times n_1, n_2 \times n_2, \dots, n_m \times n_m$, arranged in
non-increasing order (i.e., $n_1 \ge n_2 \ge \dots \ge n_m$), corresponding to a single eigenvalue $\lambda$ with
algebraic multiplicity $n_1 + n_2 + \dots + n_m$.
Let us define a parametric deformation

$$J(\xi_{11}, \xi_{12}, \dots, \xi_{1N_1}, \xi_{21}, \xi_{22}, \dots, \xi_{2N_2}, \dots, \xi_{k1}, \xi_{k2}, \dots, \xi_{kN_k})$$

of $J$, with complex parameters $\xi_{11}, \dots, \xi_{kN_k}$, where

$$N_i = n_{i1} + 3 n_{i2} + 5 n_{i3} + \dots + (2 m_i - 1) n_{i m_i}$$

for $i = 1, \dots, k$.
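
As a quick sanity check of the parameter count, the formula for $N_i$ can be evaluated directly. The following is a small sketch; the function names `parameters_for_eigenvalue` and `total_parameters`, and the representation of a Jordan structure as a list of block-size lists (one list per eigenvalue), are assumptions made for this illustration.

```python
def parameters_for_eigenvalue(block_sizes):
    """N_i = n_1 + 3*n_2 + 5*n_3 + ... for the Jordan block sizes of one
    eigenvalue, taken in non-increasing order n_1 >= n_2 >= ..."""
    sizes = sorted(block_sizes, reverse=True)
    return sum((2 * j + 1) * n for j, n in enumerate(sizes))

def total_parameters(jordan_structure):
    """N = N_1 + ... + N_k, where jordan_structure lists the block sizes
    of each of the k distinct eigenvalues."""
    return sum(parameters_for_eigenvalue(sizes) for sizes in jordan_structure)

print(parameters_for_eigenvalue([2, 1]))   # 2 + 3*1 = 5
print(parameters_for_eigenvalue([1, 1]))   # 1 + 3*1 = 4
print(total_parameters([[1], [1], [1]]))   # simple spectrum, n = 3: N = 3
```

For a simple spectrum every eigenvalue has a single block of size 1, so each $N_i$ equals 1 and the total number of parameters equals the size of the matrix.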

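First, all blocks corresponding to different eigenvalues are deformed independently:

$$J(\xi_{11}, \dots, \xi_{kN_k}) = \begin{pmatrix}
J_{n_{11},\dots,n_{1m_1}}(\xi_{11}, \dots, \xi_{1N_1}) & & & \\
& J_{n_{21},\dots,n_{2m_2}}(\xi_{21}, \dots, \xi_{2N_2}) & & \\
& & \ddots & \\
& & & J_{n_{k1},\dots,n_{km_k}}(\xi_{k1}, \dots, \xi_{kN_k})
\end{pmatrix}.$$

Second, a single Jordan block $J_\ell(\lambda)$ is deformed as follows:

$$J_\ell(x_1, \dots, x_\ell) = \begin{pmatrix}
\lambda & 1 & & & \\
& \lambda & 1 & & \\
& & \ddots & \ddots & \\
& & & \lambda & 1 \\
x_1 & x_2 & \cdots & x_{\ell-1} & \lambda + x_\ell
\end{pmatrix},$$

and, finally, the deformation $J_{n_1,n_2,\dots,n_m}(x_1, \dots, x_{n_1 + 3n_2 + \dots + (2m-1)n_m})$ of all blocks
corresponding to a single eigenvalue $\lambda$ is defined in the following recursive way:

    [displayed formula missing in the source text]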
Then, according to [A, Theorem 4.4], any smooth family of complex $n \times n$ matrices
containing $A$, and parametrized by several complex variables $t = (t_1, t_2, \dots)$, can be
represented, in a sufficiently small neighborhood of $t = (0, 0, \dots)$, as the product

$$B(\xi_1(t), \dots, \xi_N(t)) \, J(\xi_1(t), \dots, \xi_N(t)) \, B(\xi_1(t), \dots, \xi_N(t))^{-1}, \tag{3}$$

where $N = \sum_{i=1}^k N_i$, the $\xi_i$'s are smooth functions of their arguments vanishing at $0$,
$B(\xi_1, \dots, \xi_N)$ is a smooth family of invertible matrices, and

$$A = A(0) = B(0) J(0) B(0)^{-1}.$$
If the spectrum of $A$ is simple, then this picture is significantly streamlined. The total
number $N$ of parameterizing functions in (3) is equal to $n$, the size of the matrix, and the
deformation family of the diagonal matrix (1) itself consists of diagonal matrices:

$$J(\xi_1(t), \dots, \xi_n(t)) = \operatorname{diag}(\lambda_1 + \xi_1(t), \dots, \lambda_n + \xi_n(t)),$$

where $\xi_i(0) = 0$, $i = 1, \dots, n$.

There are corresponding results for matrices with real coefficients [Ga] and symmetric
matrices [PR] (as well as for many other situations in which a Lie group acts on a manifold,
see [3]), which are technically more complicated. However, for our purpose it suffices to use
Arnold's original setting. Just note that, as we are interested solely in symmetric matrices
which are brought to the diagonal form (1) by an orthogonal transformation, the combination
of the results of [A] and [PR] shows that in the decomposition (3) we may assume that all
matrices in the family $B$ are orthogonal, i.e.,

$$B(t)^{-1} = B(t)^T$$

for all $t$ from an appropriate neighborhood of zero.



2. Just getting rid of negative eigenvalues is enough 

It is well known that the set of correlation matrices coincides with the set of (real)
positive semidefinite matrices with units on the main diagonal (see, for example, [F, Chapter
III, §6, Theorem 4]), so we will use these two notions interchangeably.

Suppose $A$ is a symmetric $n \times n$ matrix with units on the main diagonal. As the set of
matrices with a simple spectrum is Zariski-dense in the set of all real $n \times n$ matrices, we may
assume that $A$ has a simple spectrum (a more down-to-earth incarnation of this fact is that all
correlation matrices appearing in banking practice, and, more generally, correlation matrices
based on a sufficiently large amount of real-world data, have a simple spectrum; in fact, the
reasoning below could be modified for the case of an arbitrary spectrum, but technically it
would become more complicated). Let (1) be its Jordan normal form, all $\lambda_i$'s being pairwise
distinct (and some of them negative, of small absolute value).

Suppose further that there exists a correlation matrix $C$ "near" $A$, and that $A$ and $C$
are members of a smooth family of matrices. The latter assumption is justified from both the
theoretical perspective (a correlation matrix is a smooth function of the time series it
correlates) and the practical one (the financial processes a correlation matrix is trying to
capture are assumed to be satisfactorily modelled by smooth functions).

According to the theory presented in §1, in a sufficiently small neighborhood $\mathcal{U}$ of
$A = A(0)$, we may write this smooth family in the following parametric form:

$$A(t) = B(t) \operatorname{diag}(\lambda_1 + \xi_1(t), \dots, \lambda_n + \xi_n(t)) B(t)^T \tag{4}$$

for some smooth functions $\xi_i$ such that $\xi_i(0) = 0$ for all $i = 1, \dots, n$, and a smooth family
$B(t) = (b_{ij}(t))_{i,j=1}^n$ of orthogonal matrices. In particular, $C$, being a member of the family,
is represented in the form (4) for some value $t = t_0$.

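The condition of positive definiteness of a member of the family $A(t)$ is equivalent to

$$\xi_i(t) > -\lambda_i, \quad i = 1, \dots, n, \tag{5}$$

and the condition of having units on the main diagonal is equivalent to

$$B^{\circ 2}(t) \begin{pmatrix} \lambda_1 + \xi_1(t) \\ \vdots \\ \lambda_n + \xi_n(t) \end{pmatrix} = \begin{pmatrix} 1 \\ \vdots \\ 1 \end{pmatrix}, \tag{6}$$

where $B^{\circ 2}(t) = (b_{ij}(t)^2)_{i,j=1}^n$ is the Hadamard square of $B(t)$. The set of solutions of (6) is
nonempty (it contains at least two points, $0$ and $t_0$), hence it forms a hypersurface $\mathcal{H}$ in the
space of parameters, and the intersection of this hypersurface with the neighborhood $\mathcal{U}$ and
the open domain defined by conditions (5) defines a certain neighborhood of $C = A(t_0)$
in $\mathcal{H}$.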
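
The role of the Hadamard square can be checked numerically: the diagonal entries of $B \operatorname{diag}(\mu_1, \dots, \mu_n) B^T$ are $\sum_j b_{ij}^2 \mu_j$, that is, the entrywise square of $B$ applied to the vector of eigenvalues. The sketch below verifies this on random data; the random orthogonal matrix (obtained from a QR factorization), the random eigenvalues `mu`, and the variable names are assumptions made for the illustration.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 4

# Hypothetical data: an orthogonal matrix B and a vector of eigenvalues mu.
B, _ = np.linalg.qr(rng.standard_normal((n, n)))
mu = rng.standard_normal(n)

A = B @ np.diag(mu) @ B.T      # symmetric matrix with eigenvalues mu
diagonal_of_A = np.diag(A)     # the entries a_ii
hadamard_form = (B ** 2) @ mu  # Hadamard square of B applied to mu

diag_identity_holds = np.allclose(diagonal_of_A, hadamard_form)
print(diag_identity_holds)
```

Requiring every entry of `hadamard_form` to equal 1 is therefore the same as requiring units on the main diagonal of `A`.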

In terms of the procedure described in the introduction, getting $C$ from $A$ amounts to
"adjusting" eigenvalues, i.e., adding to each eigenvalue $\lambda_i$ in the diagonal form (1) a small
correction $\xi_i(t_0)$, and subsequent "normalization" to obtain the correlation matrix $C = A(t_0)$.

Assuming that the neighborhood is small enough, any matrix from it will do, but which
is the best choice? As mentioned in the introduction, this is, in general, a difficult problem
not admitting a closed-form solution. Intuitively, there is no need to adjust the positive
eigenvalues, but only the negative ones, and the following imprecise reasoning supports this.

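Assuming that the matrix norm $\|\cdot\|$ measuring the "nearness" is submultiplicative and
is invariant under transposition (the latter assumption is not essential but slightly simplifies
the expressions below), we have:

$$\begin{aligned}
\|A(t) - A(0)\|
&= \| B(t) \operatorname{diag}(\lambda_1 + \xi_1(t), \dots, \lambda_n + \xi_n(t)) B(t)^T - B(0) \operatorname{diag}(\lambda_1, \dots, \lambda_n) B(0)^T \| \\
&\le \| \operatorname{diag}(\xi_1(t), \dots, \xi_n(t)) \| \, \| B(t) - B(0) \|^2 \\
&\quad + 2 \, \| \operatorname{diag}(\xi_1(t), \dots, \xi_n(t)) \| \, \| B(t) - B(0) \| \, \| B(0) \| \\
&\quad + \| \operatorname{diag}(\xi_1(t), \dots, \xi_n(t)) \| \, \| B(0) \|^2 \\
&\quad + \| B(t) - B(0) \|^2 \, \| \operatorname{diag}(\lambda_1, \dots, \lambda_n) \| \\
&\quad + 2 \, \| B(t) - B(0) \| \, \| B(0) \| \, \| \operatorname{diag}(\lambda_1, \dots, \lambda_n) \|.
\end{aligned} \tag{7}$$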
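
The estimate above follows from writing $B(t) = B(0) + (B(t) - B(0))$, expanding the product, and applying the triangle inequality together with submultiplicativity. It can be checked numerically for a concrete norm. The sketch below uses the Frobenius norm, which is submultiplicative and invariant under transposition; the choice of this norm, the random test data, the Cayley transform used to produce an orthogonal matrix near `B0`, and the helper name `estimate_rhs` are all assumptions made for the illustration.

```python
import numpy as np

def estimate_rhs(xi_norm, delta_norm, b0_norm, lam_norm):
    """Right-hand side of the estimate, given the norms of diag(xi),
    B(t) - B(0), B(0) and diag(lambda)."""
    return (xi_norm * delta_norm ** 2
            + 2 * xi_norm * delta_norm * b0_norm
            + xi_norm * b0_norm ** 2
            + delta_norm ** 2 * lam_norm
            + 2 * delta_norm * b0_norm * lam_norm)

rng = np.random.default_rng(1)
n = 5
eye = np.eye(n)
fro = np.linalg.norm  # Frobenius norm by default for 2-D arrays

# Hypothetical data: orthogonal B0, and an orthogonal Bt close to it.
B0, _ = np.linalg.qr(rng.standard_normal((n, n)))
M = rng.standard_normal((n, n))
S = 1e-3 * (M - M.T)                         # small skew-symmetric matrix
Bt = B0 @ np.linalg.solve(eye + S, eye - S)  # Cayley transform is orthogonal

lam = rng.standard_normal(n)                 # eigenvalues of A(0)
xi = 1e-2 * rng.standard_normal(n)           # small corrections

A0 = B0 @ np.diag(lam) @ B0.T
At = Bt @ np.diag(lam + xi) @ Bt.T

lhs = fro(At - A0)
rhs = estimate_rhs(fro(np.diag(xi)), fro(Bt - B0), fro(B0), fro(np.diag(lam)))
print(lhs <= rhs)
```

The bound is far from tight for the Frobenius norm (for orthogonal `B0` of size `n`, its Frobenius norm is the square root of `n`), which is consistent with its use here as a heuristic upper estimate only.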



Both theoretical considerations in [A] and computational procedures developed in [M] suggest
that the matrix entries of the parametric family $B(t)$ providing the transformation to the
canonical form (1) of the versal deformation have, as power series in the parameter $t$, the
same order of magnitude as the matrix entries of the canonical form itself. In particular, in a
sufficiently small neighborhood of zero, which can be assumed to lie inside $\mathcal{U}$,

$$\|B(t) - B(0)\| < a \, \| \operatorname{diag}(\xi_1(t), \dots, \xi_n(t)) \|$$

for some (positive) constant $a$. This, together with (7), implies that $\|A(t) - A(0)\|$ is bounded
by a cubic polynomial in $\| \operatorname{diag}(\xi_1(t), \dots, \xi_n(t)) \|$ with positive coefficients. The latter
polynomial is monotonically increasing for nonnegative arguments, so to minimize $\|A(t) - A(0)\|$
one may wish to minimize $\| \operatorname{diag}(\xi_1(t), \dots, \xi_n(t)) \|$ instead. Subject to restriction (5), for
all matrix norms appearing in practice, this amounts to setting $\xi_i(t)$ to a positive value
"just a little bit" bigger than $-\lambda_i$ if $\lambda_i$ is negative, and to zero otherwise.

We stress that these are merely non-rigorous, heuristic arguments, and by no means can they
substitute for the rigorous analysis given in [BH], [QXX] and similar papers. However, these
arguments perfectly suit the practical nature of the problem: one knows a priori that a very
close correlation matrix exists. In such a situation, Arnold's theory guarantees the existence of
such a matrix in the simple form (3). Though it is not guaranteed that this will be the nearest
correlation matrix, it certainly will be a near one, and this suffices in practice.

Of course, arguments of this sort can be used in other similar situations: for example,
to justify adjusting ("cutoff") some eigenvalues, unwanted from the physical perspective,
of (valid) correlation matrices arising in lattice gauge theory (see [YJJL] and references
therein), or correcting a degenerate covariance matrix obtained from an insufficient amount of
data, in the situation when the number of observations is much less than the number of variables
(see [TW] and references therein).